
Image Similarity Examples

General Information

Data

The images were scraped with Selenium from the website of the fashion retailer C&A. This retailer was chosen because it provides an image for each fashion item (Style Color) that shows the product alone on a white background, without a person or model, which makes the images quite homogeneous. The site's robots.txt file was checked beforehand to ensure that the scraping did not violate its rules.
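The robots.txt check described above can be automated before any page is requested. The following is a minimal sketch using Python's standard `urllib.robotparser`; the rules and the `shop.example` URLs are hypothetical placeholders rather than the retailer's actual file, and the helper name `is_allowed` is an assumption made for illustration.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if `user_agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules for illustration only (not the real retailer's robots.txt).
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/
"""

product_ok = is_allowed(SAMPLE_ROBOTS_TXT, "https://shop.example/products/shirt.jpg")
checkout_ok = is_allowed(SAMPLE_ROBOTS_TXT, "https://shop.example/checkout/cart")
print(product_ok, checkout_ok)
```

Against a live site, the same parser can load the file itself via `set_url(...)` followed by `read()`; the check would then run once per URL before Selenium opens the page.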

Methodology

Image similarity is a subjective notion that is difficult to express in quantitative terms: What makes two fashion items similar? Is it color, fit, quality, form? Every individual might weigh the importance of these characteristics differently.

The following methodology approaches this problem pragmatically: fashion item similarity is defined by the visual characteristics that help determine the categorization of a fashion product. It is important to note, however, that there is no ground truth for product similarity.

Since we define similarity by the characteristics that help determine the categorization of a product, categorization algorithms may help to define a dimensional space in which the fashion item images can be placed.
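Once every image is represented as a vector in such a space, comparing two items reduces to comparing their vectors. The source does not name a distance measure, so the sketch below assumes cosine similarity; the function names and the toy two-dimensional vectors are likewise illustrative assumptions.

```python
import numpy as np


def cosine_similarity_matrix(embeddings: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity for an (n_items, n_dims) array of embeddings."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    # Guard against division by zero for all-zero vectors.
    unit = embeddings / np.clip(norms, 1e-12, None)
    return unit @ unit.T


def most_similar(embeddings: np.ndarray, query_index: int, top_k: int = 3) -> list:
    """Indices of the `top_k` items closest to the query item, best match first."""
    scores = cosine_similarity_matrix(embeddings)[query_index]
    scores[query_index] = -np.inf  # exclude the query item itself
    return np.argsort(scores)[::-1][:top_k].tolist()


# Toy embeddings: item 1 points in nearly the same direction as item 0.
toy_embeddings = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]])
print(most_similar(toy_embeddings, query_index=0, top_k=2))
```

Cosine similarity ignores vector length and compares direction only; Euclidean distance would be an equally plausible choice, and since there is no ground truth for similarity, neither can be shown to be the correct one.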

In this case, the VGG16 neural network architecture is used with weights pretrained on the ImageNet dataset. As we are interested not in the classification of the product itself but in the dimensions that determine the classification, global average pooling is applied to the output of the last convolutional block, yielding a feature vector that defines each fashion item's position in this dimensional space.
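The feature extraction described above might look roughly like the following sketch. It assumes the Keras implementation of VGG16 that ships with TensorFlow; the source does not state which framework was used. The helper names (`global_average_pool`, `build_extractor`, `embed_image`) are hypothetical. The small NumPy function only illustrates what global average pooling does to a single feature map, while the Keras argument `pooling="avg"` performs the same operation inside the model.

```python
import numpy as np


def global_average_pool(feature_map: np.ndarray) -> np.ndarray:
    """Average a (height, width, channels) feature map over its spatial axes."""
    return feature_map.mean(axis=(0, 1))


def build_extractor():
    """VGG16 without its classifier head, ending in global average pooling."""
    # Imported lazily: TensorFlow is a heavy dependency and is only needed
    # when the real model is built (assumption: TensorFlow/Keras is installed).
    from tensorflow.keras.applications.vgg16 import VGG16

    # include_top=False drops the fully connected classification layers;
    # pooling="avg" averages the last convolutional block's output, giving
    # one 512-dimensional vector per image.
    return VGG16(weights="imagenet", include_top=False, pooling="avg")


def embed_image(model, image_path: str) -> np.ndarray:
    """Return the pooled feature vector for a single image file."""
    from tensorflow.keras.applications.vgg16 import preprocess_input
    from tensorflow.keras.utils import img_to_array, load_img

    image = load_img(image_path, target_size=(224, 224))  # VGG16's default input size
    batch = preprocess_input(np.expand_dims(img_to_array(image), axis=0))
    return model.predict(batch, verbose=0)[0]
```

With a 224 x 224 input, the last convolutional block of VGG16 produces a 7 x 7 x 512 feature map, so pooling reduces each image to 512 values regardless of where in the image a feature appears.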

For further information, visit the GitHub repository.